Contributions to the Asymptotic Minimax Theorem for the Two-Armed Bandit Problem
Abstract
The asymptotic minimax theorem for the Bernoulli two-armed bandit problem states that the minimax risk has the order N^{1/2} as N → ∞, where N is the control horizon, and provides lower and upper estimates. It can be easily extended to the normal two-armed bandit. For the normal two-armed bandit, we generalize the asymptotic minimax theorem as follows: the minimax risk is approximately equal to 0.637N^{1/2} as N → ∞.
Keywords: two-armed bandit problem, control in a random environment, minimax and Bayesian approaches, asymptotic minimax theorem, parallel processing.
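The 0.637N^{1/2} value can be probed numerically. Below is a minimal Monte Carlo sketch, not the authors' construction: it measures the worst-case regret of a simple explore-then-commit strategy on a Gaussian two-armed bandit with unit variances, where the strategy, the exploration budget m, and the grid of mean gaps are all illustrative assumptions. Since any fixed strategy's worst-case regret upper-bounds the minimax risk, the printed value should land above the 0.637N^{1/2} benchmark.

```python
import numpy as np

rng = np.random.default_rng(0)

def etc_worst_case_regret(N, m, gaps, n_runs=2000):
    """Monte Carlo worst-case regret of explore-then-commit (ETC) on a
    Gaussian two-armed bandit with unit variances and mean gap d."""
    worst = 0.0
    for d in gaps:
        regret = 0.0
        for _ in range(n_runs):
            x = rng.normal(0.0, 1.0, m)   # m exploratory pulls of the better arm
            y = rng.normal(-d, 1.0, m)    # m exploratory pulls of the worse arm
            wrong = x.mean() < y.mean()   # commit to the empirically better arm
            # regret: m forced pulls of the worse arm, plus N - 2m more if we
            # committed to the wrong arm
            regret += d * (m + wrong * (N - 2 * m))
        worst = max(worst, regret / n_runs)
    return worst

N = 10_000
m = int(N ** 0.5)                  # illustrative exploration budget
gaps = np.linspace(0.01, 0.5, 25)  # illustrative grid of mean gaps
print(f"ETC worst-case regret ~ {etc_worst_case_regret(N, m, gaps):.0f}")
print(f"0.637 * sqrt(N)       = {0.637 * N ** 0.5:.0f}")
```

With these settings the ETC worst case comes out several times larger than 0.637·sqrt(10000) ≈ 64, which is expected: explore-then-commit is not minimax optimal, and the benchmark describes the limit as N → ∞.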
Similar Articles
UDC 519.244.3 An Asymptotic Minimax Theorem for Gaussian Two-Armed Bandit
The asymptotic minimax theorem for the Bernoulli two-armed bandit problem states that the minimax risk has the order N^{1/2} as N → ∞, where N is the control horizon, and provides the estimates of the factor. For the Gaussian two-armed bandit with unit variances of one-step incomes and close expectations, we improve the asymptotic minimax theorem as follows: the minimax risk is approximately equal to 0.637N^{1/2} as N ...
Minimax Lower Bounds for the Two-Armed Bandit Problem
We obtain minimax lower bounds on the regret for the classical two-armed bandit problem. We provide a finite-sample minimax version of the well-known log n asymptotic lower bound of Lai and Robbins. Also, in contrast to the log n asymptotic results on the regret, we show that the minimax regret is achieved by mere random guessing under fairly mild conditions on the set of allowable configurations of ...
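For intuition only (not a claim from the paper above): a uniformly random strategy pulls each arm with probability 1/2, so its expected regret on a two-armed bandit with mean gap Δ is NΔ/2. Over a configuration class that only allows gaps Δ ≤ cN^{-1/2}, this is at most (c/2)N^{1/2}, the same N^{1/2} order as the minimax regret, which is how random guessing can be worst-case optimal on suitably restricted classes.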
Finite-time lower bounds for the two-armed bandit problem
We obtain minimax lower bounds on the regret for the classical two-armed bandit problem. We provide a finite-sample minimax version of the well-known log n asymptotic lower bound of Lai and Robbins. The finite-time lower bound allows us to derive conditions for the amount of time necessary to make any significant gain over a random guessing strategy. These bounds depend on the class of possible d...
Woodroofe's One-Armed Bandit Problem Revisited
We consider the one-armed bandit problem of Woodroofe [J. Amer. Statist. Assoc. 74 (1979) 799–806], which involves sequential sampling from two populations: one whose characteristics are known, and one which depends on an unknown parameter and incorporates a covariate. The goal is to maximize cumulative expected reward. We study this problem in a minimax setting, and develop rate-optimal policies ...